[Feature] Guided Decoding add LLguidance backend #5124

ST-XX · 2025-11-19T06:36:53Z

Motivation

This PR adds support for llguidance as a new backend for constrained decoding (structured generation).

Modifications

Dependency Integration: Added llguidance as a dependency in the requirements .txt file.
Backend Implementation: Implemented the wrapper/adapter for llguidance to interface with the inference engine.
Config Update: Added configuration options to select llguidance as the constrained decoding provider.
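
As a rough illustration of the config update, the sketch below validates a backend name and checks that the package it needs can be imported. This is not the code in this PR: the function name, the mapping, and the `"xgrammar"` entry are assumptions made for illustration. Only the `"guidance"` value is taken from the diff under review.

```python
import importlib.util

# Hypothetical mapping from config value to the Python package it requires.
# "guidance" is the value compared in this PR's diff; "xgrammar" is assumed.
DEFAULT_BACKEND_MODULES = {"xgrammar": "xgrammar", "guidance": "llguidance"}


def validate_guided_decoding_backend(name, backend_modules=None):
    """Return the backend name if it is known and importable, else raise."""
    modules = DEFAULT_BACKEND_MODULES if backend_modules is None else backend_modules
    if name is None:
        # Guided decoding is disabled; nothing to validate.
        return None
    if name not in modules:
        raise ValueError(f"Unknown guided decoding backend: {name!r}")
    # find_spec looks the package up without importing it, so a missing
    # optional dependency is reported at config time rather than mid-request.
    if importlib.util.find_spec(modules[name]) is None:
        raise ImportError(f"Backend {name!r} requires the {modules[name]!r} package")
    return name
```

Failing at configuration time keeps a missing optional dependency from surfacing only when the first structured-output request arrives.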

Usage or Command

See structured_outputs.md for usage.

Accuracy Tests

Validity Test: Verified that the generated output strictly adheres to the provided JSON Schema and Regex patterns.
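
A validity check of this kind can be sketched with the standard library alone. The helpers below are hypothetical and are not the tests in this PR. The schema check covers only a small subset of JSON Schema (`type`, `required`, and per-property `type`), so a real test would use a full validator.

```python
import json
import re

# Python types accepted for each supported JSON Schema "type" keyword.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def _has_type(value, type_name):
    # bool is a subclass of int in Python, so exclude it for numeric types.
    if type_name in ("integer", "number") and isinstance(value, bool):
        return False
    return isinstance(value, _JSON_TYPES[type_name])


def matches_regex(text, pattern):
    """True only if the entire output matches the pattern."""
    return re.fullmatch(pattern, text) is not None


def conforms_to_schema(text, schema):
    """Check parsed JSON against a minimal subset of JSON Schema."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not _has_type(data, schema.get("type", "object")):
        return False
    if not isinstance(data, dict):
        return True
    if any(key not in data for key in schema.get("required", [])):
        return False
    for key, sub in schema.get("properties", {}).items():
        if key in data and "type" in sub and not _has_type(data[key], sub["type"]):
            return False
    return True
```

`re.fullmatch` is used rather than `re.search` because constrained decoding should restrict the whole output, not just a substring of it.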

Checklist

- Add at least one tag to the PR title.
  - Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code and run pre-commit before committing.
- Add unit tests. If no unit tests are added, explain why in this PR.
- Provide accuracy results.
- If the current PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-11-19T06:36:59Z

Thanks for your contribution!

kevincheng2 · 2025-11-20T08:12:39Z

fastdeploy/model_executor/guided_decoding/base_guided_decoding.py

+                and self.fd_config.structured_outputs_config.guided_decoding_backend is not None
+                and self.fd_config.structured_outputs_config.guided_decoding_backend == "guidance"
+            )
+            if not ErnieArchitectures.contains_ernie_arch(architectures) or is_guidance_backend:


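Can the vocabularies of all ERNIE models now be loaded with AutoTokenizer? As I recall, that used to cause problems.

The 4.5 22B model works, but the 0.3B model crashes.
If it doesn't go through this FastTokenizer logic, the backend can't be used at all, so we're stuck.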
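
For readers following the thread, the condition in the hunk above can be read as a standalone predicate. The sketch below is an assumption-laden restatement, not the FastDeploy code: `contains_ernie_arch` is a hypothetical stand-in for `ErnieArchitectures.contains_ernie_arch`, and the architecture names in the usage comment are illustrative only.

```python
def contains_ernie_arch(architectures):
    # Hypothetical stand-in: treat any architecture name starting with
    # "Ernie" as an ERNIE model. The real check may differ.
    return any(arch.startswith("Ernie") for arch in architectures)


def should_use_hf_tokenizer(architectures, guided_decoding_backend):
    """Mirror the reviewed condition: non-ERNIE models, or any model when
    the "guidance" backend is selected, take the HF tokenizer branch."""
    is_guidance_backend = (
        guided_decoding_backend is not None
        and guided_decoding_backend == "guidance"
    )
    return not contains_ernie_arch(architectures) or is_guidance_backend


# Illustrative calls with made-up architecture names:
# should_use_hf_tokenizer(["ErnieExampleModel"], "guidance")
# should_use_hf_tokenizer(["ErnieExampleModel"], "xgrammar")
```

Written this way, the concern raised in the review is easier to see: with the guidance backend, ERNIE models also take the HF tokenizer branch, so any model whose vocabulary fails to load there would be affected.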

fastdeploy/envs.py

codecov-commenter · 2025-11-21T04:45:37Z

Codecov Report

❌ Patch coverage is 78.10945% with 44 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@f25ee3a).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| `...model_executor/guided_decoding/guidance_backend.py` | 84.90% | 17 Missing and 7 partials ⚠️ |
| `fastdeploy/config.py` | 0.00% | 7 Missing ⚠️ |
| `...tdeploy/model_executor/guided_decoding/__init__.py` | 0.00% | 6 Missing ⚠️ |
| `fastdeploy/lazy_loader.py` | 81.48% | 5 Missing ⚠️ |
| `...l_executor/guided_decoding/base_guided_decoding.py` | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5124   +/-   ##
==========================================
  Coverage           ?   60.02%           
==========================================
  Files              ?      319           
  Lines              ?    39010           
  Branches           ?     5883           
==========================================
  Hits               ?    23414           
  Misses             ?    13750           
  Partials           ?     1846

| Flag | Coverage Δ |
|---|---|
| GPU | `60.02% <78.10%> (?)` |

Copilot

Pull Request Overview

This PR adds support for llguidance as a new backend for constrained decoding (structured generation) in FastDeploy, enabling grammar-based constraints during token generation. This provides an alternative to the existing XGrammar backend.

- Added llguidance backend implementation with processor, backend, and checker classes
- Integrated llguidance into the configuration system with validation
- Added comprehensive unit tests with mocking support for environments without llguidance

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 69 comments.

| File | Description |
|---|---|
| `fastdeploy/model_executor/guided_decoding/guidance_backend.py` | Core implementation of LLGuidance backend, processor, and checker classes |
| `fastdeploy/lazy_loader.py` | New utility for lazy-loading modules to avoid pulling in heavy dependencies |
| `fastdeploy/model_executor/guided_decoding/__init__.py` | Factory integration for llguidance backend and checker |
| `fastdeploy/model_executor/guided_decoding/base_guided_decoding.py` | Added conditional logic to use HF tokenizer for guidance backend |
| `fastdeploy/config.py` | Configuration validation and import check for llguidance backend |
| `fastdeploy/envs.py` | Added environment variables for llguidance configuration |
| `requirements_guided_decoding.txt` | Added llguidance, torch dependencies |
| `tests/model_executor/guided_decoding/test_guidance_*.py` | Comprehensive unit tests with mocking support |
| `docs/**/parameters.md` | Updated parameter documentation to include guidance backend |
| `docs/**/structured_outputs.md` | Added llguidance backend to feature documentation |
| `fastdeploy/model_executor/guided_decoding/xgrammar_backend.py` | Removed max_rollback_tokens parameter |
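
The lazy-loading utility listed above is not shown in this overview. As a general illustration of the technique, a minimal sketch might defer the real import until the first attribute access. The class below is an assumption about the pattern, not the contents of `fastdeploy/lazy_loader.py`.

```python
import importlib
import types


class LazyLoader(types.ModuleType):
    """Module proxy that imports the real module on first attribute access."""

    def __init__(self, name):
        super().__init__(name)
        self._real_module = None  # populated on first use

    def _load(self):
        if self._real_module is None:
            # self.__name__ is the module name passed to __init__.
            self._real_module = importlib.import_module(self.__name__)
        return self._real_module

    def __getattr__(self, item):
        # Called only for attributes not found on the proxy itself,
        # so ordinary lookups trigger the deferred import.
        return getattr(self._load(), item)


# Usage: constructing the proxy is cheap, and the import happens later.
lazy_math = LazyLoader("math")
root = lazy_math.sqrt(9.0)  # the real import is triggered here
```

The benefit is that a heavy optional dependency is only imported by code paths that actually use it, so environments without the package can still import the surrounding module.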

Comments suppressed due to low confidence (8)